Template talk:EGS-title/Archive 1
Source code
Here, if anyone is interested: https://docs.google.com/file/d/0B62bIXPmp62eM0hWN3QyNnQyajg/edit?usp=docslist_api -- HarJIT (talk) 06:34, May 5, 2015 (UTC)
:Thanks. Definitely simpler than trying to copy the Ookii database myself. -- Hkmaly (talk) 01:09, May 7, 2015 (UTC)
I've made some drastic adjustments/improvements. Here is the latest source download: https://drive.google.com/file/d/0B62bIXPmp62ec0pJMHl4UHpzdE0/view?usp=sharing It is probably worth noting that you should read rebuild.bat in WordPad or Notepad++ (not Notepad, as it borks on UNIX line endings) before running it. In particular, delete the following:
:@rem Certain details are stored in local mirror of EGS, extract them.
:@rem This will almost certainly only work on my system, hence
:@rem files are not stored in build dir.
:C:\Python27\python extract_date2id.py > Date2Id.txt
:C:\Python27\python extract_newfiles.py > NewFiles.txt
:C:\Python27\python bg_extract_title_db.py > BgNames.txt
If you do not delete these lines, the relevant files will be overwritten with empty files and you will have to download the archive again. These files are important, including the date-to-id mapping. -- HarJIT (talk) 18:27, May 18, 2015 (UTC)
:I'm pretty sure using geany, joe, mcedit, pico or vim would be ok as well. Actually, Notepad might be the only editor with this bug. -- Hkmaly (talk) 00:13, May 19, 2015 (UTC)
I've updated it further, up-to-date link here: https://drive.google.com/file/d/0B62bIXPmp62eSVN6Mm5zUkhaQnM/view?usp=sharing -- HarJIT (talk) 10:09, May 23, 2015 (UTC)
:Finally got to parsing that. You may want to update it even more and use json (or yaml) instead of repr. Would be more portable. Also, there are like 10 characters outside 7-bit ASCII in the MegaDB, but you still managed to output half of them with incorrect encoding (the ookii is in utf-8, but the transcripts aren't, generating Principal Verrückt incorrectly multiple times, and the \xe9 for Animé is actually in source code).
:Also, the transcripts are off by one comic in several cases, like 2002-02-08 ... not sure if your input data are incorrect or if it's a bug in the code. -- Hkmaly (talk) 17:13, June 7, 2015 (UTC)
::Yes... still much work to be done. I've fixed a few issues with the use of my spidered information, making the up to date link now https://docs.google.com/file/d/0B62bIXPmp62eQkwtQklIQkI2MGc/edit?usp=docslist_api but I haven't updated the titles template yet and, as you mention, there are still many issues. And the coding is ... organic, in that it visibly grew from an initial idea rather than being planned out, a case in point being the substantial code duplication between merger, sber, nper, sp2, nsb, nnp, to say nothing of the filenames ;). -- HarJIT (talk) 07:16, June 8, 2015 (UTC)
:::Filenames are irrelevant ... code duplication is worse, code reuse is the best method to reduce the number of bugs in code. But still, the quality of output is the most important issue ... except maybe for version control. With four published versions, you should think about using some version control system (svn, git, ...), ideally one with a free public server available (github). On the other hand, most of your package is data ... -- Hkmaly (talk) 00:00, June 9, 2015 (UTC)
:::(It's true that I don't have any public code repository, but I'm also not brave enough to publish my code ... I'm mostly using the fact it's hardcoded to my system as an excuse ... but so is yours and you published it anyway.)
:::BTW, I personally don't like python, but people claim it has a lot of useful libraries. I doubt wget is the best way to download something from python, and you should consider using an sqlite database instead of files. -- Hkmaly (talk) 00:10, June 9, 2015 (UTC)
::::Hmm... I am actually developing it as part of a local Git VCS.
::::The machine I'm developing it on isn't normally connected to the internet due to arrangements in my household, so while I can upload zips to Drive on any machine such as my low-end Android device, pushing to Github would prove difficult. (I ran the recent spiders on my (rooted) Android device.)
::::I think I was (a year ago, running Python off a USB drive on a family laptop, trying to download offline copies of the EGS and xkcd back archives for personal reference) having some trouble with urllib, hence I settled on wget.
::::Per code reuse, I am presently working on (gradually) delegating recurring code into an imported module, and have actually managed to obsolete nnp and nsb with a much-more-generalised sp2. -- HarJIT (talk) 20:53, June 9, 2015 (UTC)
Update, for what it's worth: https://drive.google.com/file/d/0B62bIXPmp62eSkU2MlBabUVWbEU/view?usp=sharing -- HarJIT (talk) 07:36, June 10, 2015 (UTC)
:And again, with even more merging: https://docs.google.com/file/d/0B62bIXPmp62eUUZHclpVVkppa00/edit?usp=docslist_api . It also uses the JSON format for the final output databases (after I managed to work around the encoding issues which the JSON library was borking on). -- HarJIT (talk) 20:54, June 10, 2015 (UTC)
::... what "arrangements in my household" can prevent connecting a machine to the internet? The limit for an ethernet cable is 100m ... but of course, if you don't have internet access, pushing to Github will be hard and local git should be good enough.
::Yes, JSON requires consistent encoding, how did you think I found out those issues :-)? I see your solution is almost exactly the same as mine.
-- Hkmaly (talk) 02:09, June 11, 2015 (UTC)
Another update; this time the MegaDb format has actually changed somewhat, with all sections in a single file, and with a distinction between arcs and lines: https://drive.google.com/file/d/0B62bIXPmp62ecjY2bEJ4VDI1ZzQ/view?usp=sharing -- HarJIT (talk) 07:32, June 15, 2015 (UTC)
Again: https://drive.google.com/file/d/0B62bIXPmp62eMGNFSGNJSHk4REE/view?usp=sharing (BG incorporated into main database). -- HarJIT (talk) 20:51, June 26, 2015 (UTC)
Source code for another update I'm about to roll out: https://drive.google.com/file/d/0B62bIXPmp62eVEtnVEJfQ2FQN0U/view?usp=sharing -- HarJIT (talk) 16:51, August 11, 2015 (UTC)
Updated/corrected a little, although this particular template is unaffected: https://drive.google.com/file/d/0B62bIXPmp62edmFjOXRna0N1Rmc/view?usp=sharing -- HarJIT (talk) 20:34, August 18, 2015 (UTC)
Mostly what I've done this time is work on sorting out the licensing. The spiders were derived from some third-party code under the "MIT license" (presumably the Expat Licence), which I had not complied with, as I had not included the appropriate licence notice (I had obtained it long before, initially for use as an XKCD grabber, and hadn't kept track of the original source), and my own work didn't actually state any permission terms. I added the appropriate notices (plus a link to the original), and put my work under a slightly clarified edit of the Copyfree GAL (with the exception of a small number of files which I released into the public domain). I also separated some of the data (as opposed to code) from the Python scripts into a "titlebank" module. https://drive.google.com/file/d/0B62bIXPmp62ealB5LUNOSlBjMDg/view?usp=sharing Note that I have taken the previous versions offline, so as to cease violating the licence terms of the third-party code.
-- HarJIT (talk) 21:26, November 28, 2015 (UTC)
Some loose ends tied up: https://drive.google.com/file/d/0B62bIXPmp62eMm14UW1UYWZKN00/view?usp=docslist_api -- HarJIT (talk) 18:10, November 29, 2015 (UTC)
Okay, take two. Actually license the exporters for the title templates themselves from the database (blushes profusely) - actually, I am releasing those particular exporters into the public domain so as to bypass the issue of licensing compatibility for the incorporated substantial template documentation string: https://drive.google.com/file/d/0B62bIXPmp62eUmJCRDlGSElvd3M/view?usp=docslist_api -- HarJIT (talk) 21:22, December 8, 2015 (UTC)
Update sorting out IronPython compatibility, in case anyone is bothered: https://drive.google.com/file/d/0B62bIXPmp62eNjludDVQdGNnVHc/view?usp=docslist_api -- HarJIT (talk) 22:53, December 15, 2015 (UTC)
Many code changes here. Let's start with the main point: it passes data between routines using primary memory in many cases, reducing overhead from disk access: https://drive.google.com/file/d/0B62bIXPmp62eX1NpNFVMWk5FVFU/view?usp=docslist_api -- HarJIT (talk) 11:30, December 19, 2015 (UTC)
Latest, up to date link: https://drive.google.com/file/d/0B62bIXPmp62eWkM4UXo2YWxoT3M/view?usp=docslist_api -- HarJIT (talk) 07:23, December 20, 2015 (UTC)
Somewhat more optimised, and handles HTML entity escapes better (also the permission notice wording has been edited to more closely match that of Zlib in certain places): https://drive.google.com/file/d/0B62bIXPmp62ed0xqWVdJUjE0cVU/view?usp=sharing -- HarJIT (talk) 11:31, December 31, 2015 (UTC)
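For reference, the JSON-instead-of-repr suggestion and the mixed-encoding problem discussed earlier on this page can be sketched roughly as below. This is only an illustration, written for Python 3 (the scripts discussed here ran on Python 2.7), and it is not the code from any of the archives linked above. The helper name decode_mixed, the choice of Latin-1 as the fallback encoding, and the sample keys are all assumptions made for the example.

```python
import json


def decode_mixed(raw, primary="utf-8", fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails.

    Hypothetical helper: it assumes the non-UTF-8 inputs are Latin-1,
    which the discussion above does not actually confirm.
    """
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Hypothetical sample records, one per input encoding.
records = {
    "sample_utf8": decode_mixed("Principal Verrückt".encode("utf-8")),
    "sample_latin1": decode_mixed(b"Anim\xe9"),
}

# Once everything is a proper text string, JSON round-trips cleanly,
# unlike repr() output, which only Python can read back.
as_json = json.dumps(records, ensure_ascii=False, sort_keys=True)
restored = json.loads(as_json)
```

Normalising every string to text before serialising is what lets the JSON library stop complaining: it never sees raw bytes in two different encodings.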